Abstract: Apache Hive is a widely used data warehousing and analysis tool. Developers write SQL-like Hive queries, which are converted into MapReduce programs that run on a cluster. Despite its popularity, there is little research on performance comparison and diagnosis for Hive. Part of the reason is that the instrumentation techniques used to monitor execution cannot be applied to the intermediate MapReduce code generated from a Hive query. Because the generated MapReduce code is hidden from developers, runtime logs are the only place where a developer can get a glimpse of the actual execution. An automatic tool that extracts information and generates reports from logs is therefore essential for understanding query execution behavior. We designed a tool that builds the execution profile of individual Hive queries by extracting information from Hive and Hadoop logs. The profile consists of detailed information about the MapReduce jobs, tasks, and attempts belonging to a query. It is stored as a JSON document in MongoDB and can be retrieved to generate reports as charts or tables. We ran several experiments on AWS with TPC-H data sets and queries to demonstrate that our diagnosis tool can assist developers in comparing Hive queries written in different formats, running on different data sets, and configured with different parameters. It can also compare tasks and attempts within the same job to diagnose performance issues.
Keywords: MapReduce, Hive, Hadoop, JSON.
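
As an illustration of the kind of profile described in the abstract, the following minimal sketch shows one way a query's jobs, tasks, and attempts could be nested in a single JSON-serializable document, and how attempts within one job might be compared. The field names (`query_id`, `jobs`, `tasks`, `attempts`, `start_ms`, `finish_ms`), the identifiers, and the timing values are all hypothetical and chosen only for the example; they are not the tool's actual schema or measured results.

```python
import json

# Hypothetical profile for one Hive query. All field names and values are
# illustrative assumptions, not the tool's real schema or real measurements.
profile = {
    "query_id": "example_query",
    "jobs": [
        {
            "job_id": "job_0001",
            "tasks": [
                {
                    "task_id": "task_0001_m_000000",
                    "type": "MAP",
                    "attempts": [
                        {
                            "attempt_id": "attempt_0001_m_000000_0",
                            "start_ms": 0,
                            "finish_ms": 4000,
                        },
                    ],
                },
                {
                    "task_id": "task_0001_m_000001",
                    "type": "MAP",
                    "attempts": [
                        {
                            "attempt_id": "attempt_0001_m_000001_0",
                            "start_ms": 0,
                            "finish_ms": 9000,
                        },
                    ],
                },
            ],
        }
    ],
}


def slowest_attempt(job):
    """Return (attempt_id, duration_ms) of the longest-running attempt in a job."""
    # Flatten every attempt of every task in the job into one list.
    attempts = [a for task in job["tasks"] for a in task["attempts"]]
    worst = max(attempts, key=lambda a: a["finish_ms"] - a["start_ms"])
    return worst["attempt_id"], worst["finish_ms"] - worst["start_ms"]


# The profile is plain JSON-serializable data, so it survives a round trip.
document = json.dumps(profile)
restored = json.loads(document)

print(slowest_attempt(restored["jobs"][0]))
```

Keeping the whole query in one nested document means a report can be produced from a single lookup. With a MongoDB driver such as pymongo, a dictionary of this shape could be passed to `insert_one` on a collection, although the sketch above deliberately avoids that dependency.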